The Cascade-Correlation Learning Architecture
Authors
Scott E. Fahlman and Christian Lebiere
Abstract
Cascade-Correlation is a new architecture and supervised learning algorithm for artificial neural networks. Instead of just adjusting the weights in a network of fixed topology, Cascade-Correlation begins with a minimal network, then automatically trains and adds new hidden units one by one, creating a multi-layer structure. Once a new hidden unit has been added to the network, its input-side weights are frozen. This unit then becomes a permanent feature-detector in the network, available for producing outputs or for creating other, more complex feature detectors. The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network.

This research was sponsored in part by the National Science Foundation under Contract Number EET-8716324 and by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 4976, under Contract F33615-87-C-1499, monitored by the Avionics Laboratory, Air Force Wright Aeronautical Laboratories, Aeronautical Systems Division (AFSC), Wright-Patterson AFB, OH 45433-6543. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the US Government.

1. Why is Back-Propagation Learning So Slow?

The Cascade-Correlation learning algorithm was developed in an attempt to overcome certain problems and limitations of the popular back-propagation (or “backprop”) learning algorithm [Rumelhart, 1986]. The most important of these limitations is the slow pace at which backprop learns from examples. Even on simple benchmark problems, a back-propagation network may require many thousands of epochs to learn the desired behavior from examples. (An epoch is defined as one pass through the entire set of training examples.) We have attempted to analyze the reasons why backprop learning is so slow, and we have identified two major problems that contribute to the slowness. We call these the step-size problem and the moving target problem. There may, of course, be other contributing factors that we have not yet identified.

1.1. The Step-Size Problem

The step-size problem occurs because the standard back-propagation method computes only ∂E/∂w, the partial first derivative of the overall error function with respect to each weight in the network. Given these derivatives, we can perform a gradient descent in weight space, reducing the error with each step. It is straightforward to show that if we take infinitesimal steps down the gradient vector, running a new training epoch to recompute the gradient after each step, we will eventually reach a local minimum of the error function. Experience has shown that in most situations this local minimum will be a global minimum as well, or at least a “good enough” solution to the problem at hand. In a practical learning system, however, we do not want to take infinitesimal steps; for fast learning, we want to take the largest steps that we can. Unfortunately, if we choose a step size that is too large, the network will not reliably converge to a good solution.
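To make the step-size issue concrete, the short sketch below runs plain gradient descent on a one-dimensional toy error surface with several fixed step sizes. The quadratic surface and the particular step-size values are assumptions chosen only for illustration; they do not come from the paper.

```python
# Toy error surface E(w) = w**2, so dE/dw = 2*w.  Both the surface and the
# step sizes tried below are illustrative assumptions, not values taken
# from the Cascade-Correlation paper.
def grad(w):
    return 2.0 * w

def gradient_descent(step_size, w0=1.0, epochs=20):
    """Plain gradient descent: w <- w - step_size * dE/dw, once per epoch."""
    w = w0
    for _ in range(epochs):
        w = w - step_size * grad(w)
    return w

for eta in (0.01, 0.4, 1.1):   # too small, reasonable, too large
    print(f"step size {eta:4.2f} -> final |w| = {abs(gradient_descent(eta)):.4f}")
```

With the smallest step the error shrinks only slowly; with the middle value the weight converges quickly; with the largest value every update overshoots the minimum and the weight diverges, which is exactly the unreliable behavior described above.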
In order to choose a reasonable step size, we need to know not just the slope of the error function, but something about its higher-order derivatives (its curvature) in the vicinity of the current point in weight space. This information is not available in the standard back-propagation algorithm.

A number of schemes have been proposed for dealing with the step-size problem. Some form of “momentum” [Rumelhart, 1986] is often used as a crude way of summarizing the slope of the error surface at earlier points in the computation. Conjugate gradient methods have been explored in the context of artificial neural networks by a number of researchers [Watrous, 1988, Lapedes, 1987, Kramer, 1989], with generally good results. Several schemes, for example [Franzini, 1987] and [Jacobs, 1988], have been proposed that adjust the step size dynamically, based on the change in the gradient from one step to another. Becker and LeCun [Becker, 1988] explicitly compute an approximation to the second derivative of the error function at each step and use that information to guide the speed of descent.

Fahlman’s quickprop algorithm [Fahlman, 1988] is one of the more successful algorithms for handling the step-size problem in back-propagation systems. Quickprop computes the ∂E/∂w values just as in standard backprop, but instead of simple gradient descent, quickprop uses a second-order method, related to Newton’s method, to update the weights. On the learning benchmarks we have collected, quickprop consistently out-performs other backprop-like algorithms, sometimes by a large factor. Quickprop’s weight-update procedure depends on two approximations: first, that small changes in one weight have relatively little effect on the error gradient observed at other weights; second, that the error function with respect to each weight is locally quadratic. For each weight, quickprop keeps a copy of the error derivative computed on the previous training pass, along with the weight change it made at that time; from these two slopes and the step between them it estimates the minimum of the assumed parabola and moves the weight directly toward that point.
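As a concrete illustration of the locally-quadratic assumption, here is a minimal sketch of a quickprop-style update for a single weight. It fits a parabola through the current and previous slopes and jumps toward its estimated minimum; the function name, the maximum-growth limit, and the gradient-descent fallback shown here are simplifications for illustration, not a faithful reimplementation of Fahlman’s full algorithm.

```python
def quickprop_step(slope, prev_slope, prev_delta,
                   learning_rate=0.5, max_growth=1.75):
    """One quickprop-style update for a single weight (simplified sketch).

    slope      -- current dE/dw for this weight
    prev_slope -- dE/dw measured on the previous epoch
    prev_delta -- weight change applied on the previous epoch
    """
    if prev_delta == 0.0:
        # No previous step to extrapolate from: fall back to gradient descent.
        return -learning_rate * slope

    denom = prev_slope - slope
    if denom == 0.0:
        # Degenerate parabola (equal slopes): fall back to gradient descent.
        return -learning_rate * slope

    # Jump toward the minimum of the parabola defined by the two slopes.
    delta = (slope / denom) * prev_delta

    # Cap the step so a nearly flat parabola cannot produce a huge jump.
    limit = max_growth * abs(prev_delta)
    if abs(delta) > limit:
        delta = limit if delta > 0 else -limit
    return delta

# Example: the slope shrank from 0.8 to 0.2 after a step of -0.1, so the
# estimated minimum lies a little further in the same direction.
print(quickprop_step(slope=0.2, prev_slope=0.8, prev_delta=-0.1))
```

In practice an update of this kind would be applied to every weight on each epoch, combined with the additional safeguards described in [Fahlman, 1988].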
Similar articles
Performance Analysis of the Improved Cascade Correlation Neural Network for Time Series Analysis
The cascade correlation neural network structure is used to predict the daily closing price of the particular share item in the stock market. The primary task of any neural network architecture is to reduce the error between the actual outcome of the data and expected value obtained through the network architecture. In conventional cascade correlation neural network (CCNN), modifications of wei...
Variations on the Cascade-Correlation Learning Architecture for Fast Convergence in Robot Control
Most applications of Neural Networks in Control Systems use a version of the BackPropagation algorithm for training. Learning in these networks is generally a slow and very time consuming process. Cascade-Correlation is a supervised learning algorithm that automatically determines the size and topology of the network and is quicker than back-propagation in learning for several benchmarks. We pr...
The Recurrent Cascade-Correlation Architecture
Recurrent Cascade-Correlation (RCC) is a recurrent version of the Cascade-Correlation learning architecture of Fahlman and Lebiere [Fahlman, 1990]. RCC can learn from examples to map a sequence of inputs into a desired sequence of outputs. New hidden units with recurrent connections are added to the network one at a time, as they are needed during training. In effect, the network builds up a fi...
Building MLP Networks by Construction
We introduce two new models which are obtained through the modification of the well known methods MLP and cascade correlation. These two methods differ fundamentally as they employ learning techniques and produce network architectures that are not directly comparable. We extended the MLP architecture, and reduced the constructive method to obtain very comparable network architectures. The great...
A Parallel and Modular Multi-Sieving Neural Network Architecture for Constructive Learning
In this paper we present a parallel and modular multi-sieving neural network (PMSN) architecture for constructive learning. This PMSN architecture is different from existing constructive learning networks such as the cascade correlation architecture. The constructing element of the PMSNs is a compound modular network rather than a hidden unit. This compound modular network is called a sieving mod...
Application of combined genetic algorithms with cascade correlation to diagnosis of delayed gastric emptying from electrogastrograms.
The current standard method (radioscintigraphy) for the diagnosis of delayed gastric emptying (GE) of a solid meal involves radiation exposure and considerable expense. Based on combining genetic algorithms with the cascade correlation learning architecture, a neural network approach is proposed for the diagnosis of delayed GE from electrogastrograms (EGGs). EGGs were measured by placing surfac...